Neural Networks

This is a Jupyter notebook. Lectures about Python, useful for both beginners and experts, can be found at http://scipy-lectures.github.io.

Open the notebook by (1) copying this file into a directory, (2) running jupyter-notebook in that directory, and (3) selecting the notebook.


Written By: Riddhish Bhalodia


In this exercise, we will learn about different neural network concepts. Some familiarity with probability and basic machine learning is assumed.

The Perceptron Algorithm

The perceptron is an example of a linear discriminant model, used for two-class classification. In this model the input vector x is transformed by a fixed non-linear transformation. Starting from a generalized linear regression model we have

$$ y(\textbf{x}) = \textbf{w}^T\phi(\textbf{x})$$

In the perceptron we pass the output of this linear model through a non-linear activation function as follows

$$y(\textbf{x}) = f(\textbf{w}^T\phi(\textbf{x})) \quad \quad \quad (1)$$

Here, $f(.)$ is given by $$ f(a) = \left\{ \begin{array}{ll} -1 & \quad a < 0 \\ 1 & \quad a \geq 0 \end{array} \right. $$
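
As a quick illustration, here is a minimal NumPy sketch of this step activation and the resulting perceptron output of equation (1); the helper names step and perceptron_output are ours for illustration, not part of any library.


In [ ]:
import numpy as np

def step(a):
    # f(a) = -1 if a < 0, +1 if a >= 0 (works element-wise on arrays too)
    return np.where(a < 0, -1, 1)

def perceptron_output(w, phi_x):
    # y(x) = f(w^T phi(x)) for a single feature vector phi(x)
    return step(np.dot(w, phi_x))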

Since we have two classes $\mathcal{C}_1$ and $\mathcal{C}_2$, we define a target variable $t$ which takes the value +1 for $\mathcal{C}_1$ and -1 for $\mathcal{C}_2$. To determine the parameters $\textbf{w}$ we need to define an error function to minimize.

A natural choice of error function is the total number of misclassified patterns; however, this count is a piecewise constant function of $\textbf{w}$, so its gradient is zero almost everywhere and it cannot drive a gradient-based learning algorithm. Hence we use an alternative error function called the perceptron criterion, given by

$$ E_p(\textbf{w}) = - \sum \limits _{n \in \mathcal{M}} \textbf{w}^T \phi (\textbf{x}_n) t_n \quad \quad \quad (2)$$

Here, $\mathcal{M}$ denotes the set of all misclassified patterns; the reasoning behind this functional can be found in Christopher M. Bishop's book, Pattern Recognition and Machine Learning.
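
For concreteness, the criterion can be written in a few lines of NumPy. This is an illustrative sketch (the names Phi, t, and perceptron_criterion are ours): Phi stacks the feature vectors $\phi(\textbf{x}_n)$ row-wise, t holds the targets in {-1, +1}, and step is the activation sketched above.


In [ ]:
def perceptron_criterion(w, Phi, t):
    scores = Phi @ w                      # w^T phi(x_n) for every n
    mis = step(scores) != t               # the misclassified set M
    return -np.sum(scores[mis] * t[mis])  # equation (2)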

Trivial example

Here we will simulate a trivial example with 2D data in the square [-1,1] x [-1,1], and we will assume that $\phi(\textbf{x}_n) = \textbf{x}_n \quad \forall \textbf{x}_n$. We first need to generate the data.


In [27]:
%matplotlib inline

import numpy as np
import scipy as sp
import matplotlib.pyplot as plt

# now we generate the data
N = 30
x = np.zeros(N, dtype=np.float64)
y = np.zeros(N, dtype=np.float64)
for k in range(N):
    x[k], y[k] = [np.random.uniform(-1,1) for i in range(2)]
    
a = np.random.uniform(-1,1)
b = np.random.uniform(-1,1)
c = np.random.uniform(-1,1)
label = np.ones(N) # stores the labels for two classes, 1 for C1 and -1 for C2
xa = []
ya = []
xb = []
yb = []
N1 = 0
N2 = 0

# the random line divides the points into two classes of size N1 and N2
for k in range(N):
    temp = a*x[k] + b*y[k] + c
    if temp > 0:
        xa.append(x[k])
        ya.append(y[k])
        N1 += 1
    else:
        label[k] = -1
        xb.append(x[k])
        yb.append(y[k])
        N2 += 1

Now we plot the two classes as a scatter plot!


In [28]:
plt.scatter(xa, ya, color = 'b')
plt.scatter(xb, yb, color = 'r')
plt.title('Scatter plot of the data, N = 30')


Out[28]:
<matplotlib.text.Text at 0x1130c0c10>

Now we want to classify this synthetic data with a perceptron model trained on the same data, and then test on that same data (this is called a self-classification test). To proceed we first need to train our perceptron model using the theory above.

Here the dimension of the weight vector $\textbf{w}$ is 3 (as we just need to estimate a line). We initialize the parameters to ones.


In [49]:
w = np.ones(3, dtype=np.float64) # the weights
iter_max = 100 # maximum number of iterations
error = 100.0 # initialize the classification error to a large value
it = 0 # variable to store the iteration number
eta = 0.02 # the step size (try varying this)
classified_labels = np.ones(N)

Now how do we solve for the parameters? We apply simple gradient descent to the objective function (the function of the parameters to be estimated, which is to be minimized). Taking the derivative of equation (2), we get the update rule

$$ \textbf{w}^{(l+1)} = \textbf{w}^{(l)} + \eta \sum \limits_{n \in \mathcal{M}} \phi (\textbf{x}_n) t_n $$
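
As an aside, this batch update can be written in vectorized NumPy using the variables defined above. This is a minimal sketch, assuming we collect the features into a matrix Phi with rows [x_i, y_i, 1]; the names Phi, t, mis, and w_updated are ours and are not used in the loop below.


In [ ]:
Phi = np.column_stack((x, y, np.ones(N)))    # phi(x_n) = [x_n, y_n, 1]
t = label                                    # targets in {-1, +1}
mis = classified_labels != t                 # the misclassified set M
w_updated = w + eta * Phi[mis].T @ t[mis]    # the update rule above, in one line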

So now we start coding the actual parameter estimation part.


In [50]:
while (error != 0 and it < iter_max):
    print(it)
    # gradient descent update: accumulate the contributions of the
    # misclassified points (the derivative of equation (2))
    temp_vec = np.zeros(3, dtype=np.float64)
    for i in range(N):
        if label[i] != classified_labels[i]:
            temp_vec += eta * np.array([x[i], y[i], 1]) * label[i]

    w += temp_vec
    # recompute the classification with the updated weights
    for i in range(N):
        temp = w[0]*x[i] + w[1]*y[i] + w[2]
        if temp > 0:
            classified_labels[i] = 1
        else:
            classified_labels[i] = -1

    # compute the misclassification error (the perceptron criterion)
    error = 0
    for i in range(N):
        temp = w[0]*x[i] + w[1]*y[i] + w[2]
        if label[i] != classified_labels[i]:
            error += - label[i] * temp
    print(error)
    it += 1


0
30.4324768092
1
2.28166531608
2
0

In [53]:
x = np.linspace(-1,1,100)
y = -(w[0] * x + w[2]) / w[1]
plt.scatter(xa, ya, color = 'b')
plt.scatter(xb, yb, color = 'r')
plt.plot(x,y, color='k')
plt.title('Perceptron classified data (the line)')


Out[53]:
<matplotlib.text.Text at 0x1134da610>

We can see that this perceptron model classifies the data very well :) Let's check how close the estimated line is to the actual line we used to generate the data.


In [57]:
x = np.linspace(-1,1,100)
y = -(w[0] * x + w[2]) / w[1]
plt.plot(x,y,color='b')
x = np.linspace(-1,1,100)
y = -(a * x + c) / b
plt.plot(x,y,color='r')
plt.legend(['predicted', 'original'])


Out[57]:
<matplotlib.legend.Legend at 0x1137a0990>

Try changing N and see how the prediction changes! Now we will move on to see how this helps us with neural networks.

Multilayer Perceptron

Brief Intro to Neural Networks

In short, neural networks aim to build mathematical constructs for information processing that mimic biological systems. Disregarding several constraints that the actual biological system imposes, we can use the core idea of neural networks for many pattern recognition applications. The following aspects of the theory are taken from Christopher Bishop's book.

The most basic..

The multilayer perceptron that we are going to discuss now is one of the simplest and most widely used neural network models; it is also known as a feed-forward network. The goal is to extend the regression model by making the basis functions $\phi(\textbf{x})$ depend on parameters that are estimated along with the weights $\{w_j\}$. We describe the neural network as a series of functional transformations built from simple linear combinations:

(A) M linear combinations of the inputs $(x_1,...,x_D)$ to generate activations

$$a_j = \sum \limits_{i=1}^D w_{ji}^{(1)}x_i + w_{j0}^{(1)} \quad \quad \quad (3)$$

The superscript (1) denotes the first layer of the neural network.

(B) Transformation of activations by a differentiable, non-linear activation function h(.)

$$z_j = h(a_j) \quad \quad \quad (4)$$

These $z_j$ are the hidden units, and equations (3) and (4) together have the form of a perceptron model.

(C) Similarly to the inputs, we have output activations, constructed by linearly combining the hidden units

$$a_k = \sum \limits_{j=1}^{M} w_{kj}^{(2)}z_j + w_{k0}^{(2)} \quad \quad \quad (5)$$

(D) Lastly, the output activations are passed through a non-linear activation function to give the network outputs

$$y_k = \sigma (a_k) \quad \quad \quad (6)$$

These steps can be represented by the following figure (taken from Pattern Recognition and Machine Learning by Christopher Bishop).

The model is forward-propagating, and once we set $x_0 = 1$ and absorb the bias weight into the set of weights, we can see why this looks like a multilayer perceptron model. The key difference is that the hidden units use continuous (sigmoidal) non-linearities, which makes the overall network function (obtained by composing all the steps) differentiable with respect to the parameters; this enables smooth, gradient-based training, as we will see further on.
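
To make the four steps concrete, here is a minimal NumPy sketch of one forward pass through such a two-layer network, assuming a logistic sigmoid for both $h(.)$ and $\sigma(.)$ and absorbing the biases via $x_0 = 1$ as described above; the function and variable names are ours for illustration.


In [ ]:
def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, W1, W2):
    # x: (D,) input; W1: (M, D+1) first-layer weights (bias in column 0);
    # W2: (K, M+1) second-layer weights (bias in column 0)
    x_aug = np.concatenate(([1.0], x))  # x_0 = 1 absorbs the bias w_{j0}
    a_hidden = W1 @ x_aug               # equation (3): activations a_j
    z = sigmoid(a_hidden)               # equation (4): hidden units z_j
    z_aug = np.concatenate(([1.0], z))  # z_0 = 1 absorbs the bias w_{k0}
    a_out = W2 @ z_aug                  # equation (5): output activations a_k
    return sigmoid(a_out)               # equation (6): network outputs y_k

# example: D = 2 inputs, M = 3 hidden units, K = 1 output
W1 = np.random.randn(3, 3)
W2 = np.random.randn(1, 4)
print(forward(np.array([0.5, -0.2]), W1, W2))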


In [ ]: